Search CORE

5 research outputs found

PCA and K-Means decipher genome

Author: A Zinovyev
AN Gorban
AY Zinovyev
FHC Crick
HY Ou
J Jackson
R Staden
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

In this paper, we aim to give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students will learn how data visualization can help in genomic sequence analysis. Students start with a fragment of genetic text of a bacterial genome and analyze its structure. By means of principal component analysis they "discover" that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of basic bioinformatics notions. Appendix 1 contains program listings that go along with this exercise. Appendix 2 includes 2D PCA plots of triplet usage in a moving frame for a series of bacterial genomes, from GC-poor to GC-rich ones. Animated 3D PCA plots are attached as separate GIF files. The topology (cluster structure) and geometry (mutual positions of clusters) of these plots clearly depend on GC-content.
Comment: 18 pages, with program listings for MatLab, PCA analysis of genomes and additional animated 3D PCA plots

arXiv.org e-Print Archive

CiteSeerX

Crossref
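
A minimal Python/NumPy sketch of the exercise described in "PCA and K-Means decipher genome": cut a sequence into sliding windows, count non-overlapping triplets read from each window's first letter (a 64-dimensional frequency vector per window), project the vectors with PCA and cluster them with K-means. This is not the authors' MATLAB code from Appendix 1. The window width and step, the farthest-point initialization of K-means and the synthetic sequence of A?T "codons" are assumptions chosen so that the example runs on its own; on this toy sequence the three reading phases should form three clusters. On a real bacterial genome one would expect more clusters (reading phases on both strands plus non-coding regions).

```python
import numpy as np
from itertools import product

# All 64 triplets over the DNA alphabet, in a fixed order
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(seq, width=300, step=5):
    """Frequencies of non-overlapping triplets in sliding windows.

    Each window of `width` letters is read in triplets from its first letter;
    windows start every `step` letters (both values are assumptions).
    Returns the window start positions and an (n_windows, 64) matrix.
    """
    starts, rows = [], []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        counts = np.zeros(len(TRIPLETS))
        for i in range(0, width - 2, 3):
            t = window[i:i + 3]
            if t in INDEX:  # skip triplets containing N or other symbols
                counts[INDEX[t]] += 1
        starts.append(start)
        rows.append(counts / max(counts.sum(), 1.0))
    return np.array(starts), np.array(rows)


def pca(X, k=2):
    """Project centred data onto the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, s ** 2 / (len(X) - 1)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


# Synthetic "coding" sequence of 1000 codons of the form A?T (toy data only)
rng = np.random.default_rng(1)
seq = "".join("A" + "ACGT"[rng.integers(4)] + "T" for _ in range(1000))

starts, X = triplet_frequencies(seq)
scores, variances = pca(X, k=2)   # 2D coordinates for a scatter plot
labels, _ = kmeans(X, k=3)
phases = starts % 3               # reading phase of each window
```

Plotting `scores` coloured by `labels` gives the kind of 2D PCA picture the abstract describes; comparing `labels` with `phases` shows that the clusters correspond to the reading frame of each window.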

PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes

Author: A Gorban
A Gusev
A Zinovyev
AN Gorban
AY Zinovyev
B Kégl
CM Bishop
E Erwin
F Mulier
FHC Crick
H Ritter
J Einbeck
K Pearson
M Löwe
M Nagl
R Shyamsundar
S Matveev
T Hastie
T Kohonen
TM Martinetz
VA Dergachev
YF Leung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2007

Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional "principal object": a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as a particular case. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedding into the multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction. The simplest case of a topological grammar ("add a node", "bisect an edge") is equivalent to the construction of "principal trees", an object useful in many practical applications. We demonstrate how it can be applied to the analysis of bacterial genomes and to the visualization of cDNA microarray data using the "metro map" representation. The preprint is supplemented by an animation: "How the topological grammar constructs branching principal components (AnimatedBranchingPCA.gif)".
Comment: 19 pages, 8 figures

arXiv.org e-Print Archive

Crossref
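
For "PCA Beyond The Concept of Manifolds", the simplest grammar ("add a node", "bisect an edge") combined with elastic energy minimization can be sketched as a greedy loop: apply every candidate transformation, re-optimize the node positions, and keep the candidate with the lowest energy. The energy used in this sketch is the mean squared distance from points to their nearest node, plus λ times the squared edge lengths, plus μ times the squared deviation of each star centre from the mean of its neighbours; for a fixed data-to-node partition it is quadratic, so node positions come from one linear solve. The constant λ and μ, the placement rule for a new leaf (hypothetical), the two-node PCA start and the three-armed toy data are simplifying assumptions, not the paper's settings.

```python
import numpy as np


def sq_dists(X, Y):
    """(N, n) matrix of squared distances between data points and nodes."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)


def elastic_matrix(n, edges, lam, mu):
    """Matrix A with trace(Y^T A Y) = edge (spring) energy + star energy."""
    A = np.zeros((n, n))
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        e = np.zeros(n)
        e[a], e[b] = 1.0, -1.0
        A += lam * np.outer(e, e)          # lam * ||y_a - y_b||^2
        nbrs[a].append(b)
        nbrs[b].append(a)
    for c, nb in enumerate(nbrs):
        if len(nb) >= 2:                   # elastic star centred at node c
            r = np.zeros(n)
            r[c] = 1.0
            r[nb] -= 1.0 / len(nb)
            A += mu * np.outer(r, r)       # mu * ||y_c - mean(neighbours)||^2
    return A


def energy(Y, edges, X, lam, mu):
    A = elastic_matrix(len(Y), edges, lam, mu)
    return sq_dists(X, Y).min(axis=1).mean() + np.sum(Y * (A @ Y))


def fit(Y, edges, X, lam, mu, n_iter=100):
    """Alternate nearest-node partition and exact quadratic minimization."""
    A = elastic_matrix(len(Y), edges, lam, mu)
    for _ in range(n_iter):
        labels = sq_dists(X, Y).argmin(axis=1)
        D = np.bincount(labels, minlength=len(Y)) / len(X)
        S = np.zeros_like(Y)
        np.add.at(S, labels, X)
        Y_new = np.linalg.solve(np.diag(D) + A, S / len(X))
        converged = np.allclose(Y_new, Y)
        Y = Y_new
        if converged:
            break
    return Y


def add_node(Y, edges, j, X):
    # Hypothetical placement: halfway to the farthest point served by node j
    pts = X[sq_dists(X, Y).argmin(axis=1) == j]
    if len(pts):
        new = (Y[j] + pts[((pts - Y[j]) ** 2).sum(axis=1).argmax()]) / 2
    else:
        new = Y[j].copy()
    return np.vstack([Y, new]), edges + [(j, len(Y))]


def bisect_edge(Y, edges, k):
    a, b = edges[k]
    rest = edges[:k] + edges[k + 1:]
    return np.vstack([Y, (Y[a] + Y[b]) / 2]), rest + [(a, len(Y)), (len(Y), b)]


def grow_principal_tree(X, n_nodes, lam=0.01, mu=0.1):
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    sd = s[0] / np.sqrt(len(X) - 1)
    Y, edges = np.array([mean - sd * Vt[0], mean + sd * Vt[0]]), [(0, 1)]
    Y = fit(Y, edges, X, lam, mu)
    while len(Y) < n_nodes:
        candidates = [add_node(Y, edges, j, X) for j in range(len(Y))]
        candidates += [bisect_edge(Y, edges, k) for k in range(len(edges))]
        best = None
        for Yc, Ec in candidates:
            Yc = fit(Yc, Ec, X, lam, mu)
            U = energy(Yc, Ec, X, lam, mu)
            if best is None or U < best[0]:
                best = (U, Yc, Ec)
        _, Y, edges = best
    return Y, edges


# Toy data: three noisy arms meeting at the origin
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, size=300)
ang = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])[rng.integers(0, 3, size=300)]
X = np.c_[t * np.cos(ang), t * np.sin(ang)] + rng.normal(scale=0.03, size=(300, 2))
Y, edges = grow_principal_tree(X, n_nodes=8)
```

Because every candidate is re-optimized before comparison, the sketch grows the graph one node at a time while staying a tree; whether a branching node appears depends on λ, μ and the data.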

Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

Author: A Gorban
A Gusev
A Zinovyev
A. N. Gorban
AJ Smola
AN Gorban
AY Zinovyev
B Kégl
B Mirkin
B Schölkopf
CM Bishop
CM Perou
D Stanford
DG Kendall
E Erwin
F Mulier
H Ritter
H Yin
H Zou
JB Tenenbaum
JD Banfield
K Pearson
Kégl
L Aizenberg
L Dyrskjot
M Born
M Fréchet
M LeBlanc
M Oja
R Durbin
R Sayle
R Shyamsundar
S Kaski
S Roweis
T Hastie
T Kohonen
VA Dergachev
W Cai
Y Wang
YF Leung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2007

Principal manifolds are defined as lines or surfaces passing through "the middle" of a data distribution. Linear principal manifolds (Principal Components Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds were proposed, including our elastic maps approach, which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing "principal objects" of various dimensions and topologies with the simplest quadratic form of the smoothness penalty, which allows very efficient parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (the VidaExpert http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we give an overview of the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of the between-point distance structure, preservation of local point neighborhoods and representation of point classes in low-dimensional spaces.
Comment: 35 pages, 10 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref
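
The elastic map idea in "Elastic Maps and Nets for Approximating Principal Manifolds" can be sketched for a 2D rectangular grid: neighbouring nodes are joined by springs (stretching term λ Σ ||y_a − y_b||²) and every three consecutive nodes along a row or column form a rib (bending term μ Σ ||y_a − 2y_c + y_b||²). Fitting alternates two steps: assign each point to its nearest node, then solve the linear system that minimizes the quadratic energy for that fixed assignment. Neither step can increase the energy, so the returned history should be non-increasing. This is a Python/NumPy sketch, not VidaExpert or ViMiDa code; the grid size, λ, μ, the PCA-plane initialization and the curved toy data are assumptions.

```python
import numpy as np


def grid_graph(rows, cols):
    """Edges between lattice neighbours and ribs (three consecutive nodes
    along a row or a column) of a rows x cols rectangular grid."""
    def idx(r, c):
        return r * cols + c

    edges, ribs = [], []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((idx(r, c), idx(r, c + 1)))
            if r + 1 < rows:
                edges.append((idx(r, c), idx(r + 1, c)))
            if 0 < c < cols - 1:
                ribs.append((idx(r, c - 1), idx(r, c), idx(r, c + 1)))
            if 0 < r < rows - 1:
                ribs.append((idx(r - 1, c), idx(r, c), idx(r + 1, c)))
    return edges, ribs


def elastic_matrix(n, edges, ribs, lam, mu):
    """Matrix A with trace(Y^T A Y) = stretching + bending energy."""
    A = np.zeros((n, n))
    for a, b in edges:      # lam * ||y_a - y_b||^2
        v = np.zeros(n)
        v[a], v[b] = 1.0, -1.0
        A += lam * np.outer(v, v)
    for a, c, b in ribs:    # mu * ||y_a - 2 y_c + y_b||^2
        v = np.zeros(n)
        v[a], v[c], v[b] = 1.0, -2.0, 1.0
        A += mu * np.outer(v, v)
    return A


def elastic_map(X, rows=8, cols=8, lam=0.01, mu=0.1, n_iter=100):
    """Fit a rows x cols elastic map; return node positions and energy history."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    sd = s[:2] / np.sqrt(len(X) - 1)
    u, w = np.meshgrid(np.linspace(-2, 2, cols), np.linspace(-2, 2, rows))
    # initial grid spans +-2 standard deviations in the plane of PC1 and PC2
    Y = mean + np.outer(u.ravel() * sd[0], Vt[0]) + np.outer(w.ravel() * sd[1], Vt[1])
    edges, ribs = grid_graph(rows, cols)
    A = elastic_matrix(len(Y), edges, ribs, lam, mu)
    history = []
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                       # step 1: partition
        history.append(d2.min(axis=1).mean() + np.sum(Y * (A @ Y)))
        D = np.bincount(labels, minlength=len(Y)) / len(X)
        S = np.zeros_like(Y)
        np.add.at(S, labels, X)
        # step 2: minimize the quadratic energy for the fixed partition
        Y_new = np.linalg.solve(np.diag(D) + A, S / len(X))
        if np.allclose(Y_new, Y):
            break
        Y = Y_new
    return Y, history


rng = np.random.default_rng(0)
uv = rng.uniform(-1, 1, size=(500, 2))
X = np.c_[uv, uv[:, 0] ** 2] + rng.normal(scale=0.02, size=(500, 3))  # curved sheet
Y, history = elastic_map(X)
```

The matrix `np.diag(D) + A` stays positive definite here because the grid is connected and the spring and rib terms vanish only on constant shifts, which the data term penalizes. A fair comparison with linear PCA's approximation error would project points onto the piecewise-linear surface spanned by the grid rather than onto nearest nodes.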

Mathematical Modelling of Cell-Fate Decision in Response to Death Receptor Engagement

Author: A Degterev
A Jurewicz
A Kawahara
A Krikos
A Lissat
A Naldi
A Zinovyev
AJ Kowaltowski
Andrei Zinovyev
AY Andreyev
B Zhivotovsky
BB Aldridge
Boris Zhivotovsky
BP Eckelman
C Du
C Kitanaka
CG Pham
CM Croce
CY Wang
D Hanahan
Denis Thieffry
DJ Turner
DV Krysko
E Varfolomeev
EA Slee
EE Varfolomeev
Emmanuel Barillot
G Kroemer
GM Fimia
H Harlin
H Kamata
H LeBlanc
H Yoshida
I Imoto
IN Lavrik
J Hitomi
J Zhang
JE Chipuk
JE Vince
L Tournier
L Yu
Laurence Calzone
Laurent Tournier
M Bentele
M Fussenegger
M Karin
M Li
M Rehm
MA Kelliher
MJ Morgan
N Harper
N Holler
N Rampino
N Shivapurkar
P Li
P Vandenabeele
PH Krammer
R Thomas
R Zhang
Rama Ranganathan
RS Moubarak
RW Johnstone
S Daniel
S Kreuz
S Legewie
S Orrenius
S Sakon
SA Kauffman
SD Catz
Simon Fourquet
T Eissing
T Teitz
T Vanden Berghe
V Cowling
W Fiers
WC Yeh
Y Kouroku
Y Xu
Z Dai
ZG Liu
Publication venue: Public Library of Science
Publication date: 01/01/2010

Cytokines such as TNF and FASL can trigger death or survival depending on cell lines and cellular conditions. The mechanistic details of how a cell chooses among these cell fates are still unclear. Understanding these processes is important because they are altered in many diseases, including cancer and AIDS. Using a discrete modelling formalism, we present a mathematical model of cell fate decision that recapitulates and integrates the most consistent facts extracted from the literature. This model provides a generic high-level view of the interplay between the NFκB pro-survival pathway, RIP1-dependent necrosis, and the apoptosis pathway in response to death receptor-mediated signals. Wild type simulations demonstrate robust segregation of cellular responses to receptor engagement. Model simulations recapitulate documented phenotypes of protein knockdowns and enable the prediction of the effects of novel knockdowns. In silico experiments simulate the outcomes following ligand removal at different stages, and suggest experimental approaches to further validate and specialise the model for particular cell types. We also propose a reduced conceptual model implementing the logic of the decision process. This analysis gives specific predictions regarding cross-talk between the three pathways, as well as the transient role of RIP1 protein in necrosis, and confirms the phenotypes of novel perturbations. Our wild type and mutant simulations provide novel insights for restoring apoptosis in defective cells. The model analysis expands our understanding of how cell fate decisions are made. Moreover, our current model can be used to assess contradictory or controversial data from the literature. Ultimately, it constitutes a valuable reasoning tool to delineate novel experiments.

CiteSeerX

Crossref

HAL AMU

Directory of Open Access Journals

INRIA a CCSD electronic archive server
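
The published cell-fate model is not reproduced here. As a sketch of what a discrete (logical) model and an in silico knockdown look like, the toy network below has one input node LIG and three mutually inhibiting fate nodes SURV, APOP and NEC; these names and all the rules are invented for illustration. Stable states are states that the rules leave unchanged. Reachability is explored with asynchronous updating (one node changes per transition), a common choice for logical models but not necessarily the scheme used in the paper.

```python
from collections import deque
from itertools import product

# Toy logical model: node names and rules are invented for illustration
NODES = ("LIG", "SURV", "APOP", "NEC")


def update(state, knockouts=()):
    """Target value of every node given the current state (tuple of bools)."""
    s = dict(zip(NODES, state))
    target = {
        "LIG": s["LIG"],  # input node, held constant
        "SURV": s["LIG"] and not s["APOP"] and not s["NEC"],
        "APOP": s["LIG"] and not s["SURV"] and not s["NEC"],
        "NEC": s["LIG"] and not s["SURV"] and not s["APOP"],
    }
    for node in knockouts:  # in silico knockout: node forced to 0
        target[node] = False
    return tuple(bool(target[n]) for n in NODES)


def stable_states(knockouts=()):
    """All states left unchanged by the update rules (fixed points)."""
    return [st for st in product((False, True), repeat=len(NODES))
            if update(st, knockouts) == st]


def reachable_stable_states(initial, knockouts=()):
    """Stable states reachable from `initial` under asynchronous updating,
    where exactly one node whose target differs changes per transition."""
    seen, queue, reached = {initial}, deque([initial]), set()
    while queue:
        st = queue.popleft()
        target = update(st, knockouts)
        succ = [st[:i] + (target[i],) + st[i + 1:]
                for i in range(len(NODES)) if target[i] != st[i]]
        if not succ:
            reached.add(st)
        for nxt in succ:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return reached


def active(state):
    """Names of the nodes that are ON in a state."""
    return [n for n, v in zip(NODES, state) if v]


resting_plus_ligand = (True, False, False, False)
for st in sorted(reachable_stable_states(resting_plus_ligand)):
    print(active(st))
```

With these invented rules, adding the ligand to the resting state can lead to each of the three fate states. Passing `knockouts=("SURV",)` removes the survival outcome. Starting from the SURV state with LIG switched off returns the network to the all-off state, a crude analogue of the ligand-removal experiments mentioned in the abstract.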

How Deep Should be the Depth of Convolutional Neural Networks: a Backyard Dog Case Study

Author: Alexander N. Gorban
AN Gorban
AY Zinovyev
D White
Evgeny M. Mirkes
G Zhong
I Tyukin
Ivan Y. Tyukin
L Huiying
R Ranjan
Xiao Sun
Y Koren
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'

Crossref